Abstract. Biochemical processes typically involve huge numbers of individual 
reversible steps, each with its own dynamical rate constants. For example, kinetic 
proofreading processes rely upon numerous sequential reactions in order to guarantee 
the precise construction of specific macromolecules. In this work, we study the transient 
properties of such systems and fully characterize their first passage (completion) time 
distributions. In particular, we provide explicit expressions for the mean and the 
variance of the completion time for a kinetic proofreading process and computational 
analyses for more complicated biochemical systems. We find that, for a wide range 
of parameters, as the system size grows, the completion time behavior simplifies: it 
becomes either deterministic or exponentially distributed, with a very narrow transition 
between the two regimes. In both regimes, the dynamical complexity of the full system 
is trivial compared to its apparent structural complexity. Similar simplicity is likely to 
arise in the dynamics of many complex multistep biochemical processes. In particular, 
these findings suggest not only that one may not be able to understand individual 
elementary reactions from macroscopic observations, but also that such understanding 
may be unnecessary. 

Keywords: Completion time, kinetic proofreading, master equation, Markov process, 
random walk, Laplace transform. 

1. Introduction 

Considering the ever increasing quantity of known biochemical reactions, one cannot 
help but be amazed and daunted by the incredible complexity of the implied cellular 
networks. For example, just a handful of different proteins can form a combinatorially 
large number of interacting molecular species, such as in the case of immune signaling [1], 
where multiple receptor modification sites result in a model with 354 distinct chemical 
species. One must then ask: When do all details of this seemingly incomprehensible 
complexity actually matter, and when is there a smaller set of aggregate, coarse-grained 
dynamical variables, parameters, and reactions that approximate the salient features of 
the system's dynamics? What determines which features are relevant and which are 
not? And if the networks have a simple equivalent dynamics, did nature choose to make 
them so complex in order to fulfill a specific biological function? Or is the unnecessary 
complexity a "fossil record" of the evolutionary heritage? 

In this article, we begin investigation of these questions in the context of certain 
biochemical kinetics networks, namely a reversible linear pathway, a kinetic proofreading 
(KPR) scheme [2], their combination, and an extension to a much more arbitrary 
multistep completion process. These motifs are common in a variety of cellular 
processes (including DNA synthesis and repair [3, 4], protein translation [5], molecular 
transport [6], receptor-initiated signaling [7, 8, 9, 10, 11, 12], and other processes) where 
assembly of large biochemical structures requires multiple reversible steps. However, in 
this article, we leave aside the functional behavior of these networks and focus instead on 
a different question: do these complex kinetic schemes have a simplified, yet accurate 
description? Since multistep structural complexity (see Fig. 1) is crucial for kinetic 
proofreading, the KPR process is an ideally suited example for this analysis, but our 
conclusions will extend to numerous other complex biochemical processes. 



Figure 1. Schematic description of the model. The process begins at the site 
i = 0, represented with a star. At each site, the process may transition one step to 
the right with the forward rate k, one step to the left with the backward rate r, or all 
the way back to the origin with the return rate γ. The right-most site, i = L, is an 
absorbing site (cloud) at which the process is completed. 

We show analytically and numerically that, over broad ranges of parameters, 
different kinetic schemes exhibit the behavior of either a deterministic process, or a 
single-step exponential-waiting-time process. We also propose intuitive arguments for 
the result, which leads us to believe that similar simplifications of complex behavior 
may be widespread, and even universal. We support this conjecture by numerically 
studying a few more complex systems, but leave a general mathematical proof of this 
conjecture to future work. 

1.1. The Model 

For this study we begin with a general KPR (gKPR) model [2], for which many 
properties can be computed analytically. The model is represented by the Markov 
chain in Fig. 1. At time t = 0, the dynamics begins at the point represented by the star 
(i = 0). The process can leave this state at some exponentially distributed waiting time, 
defined by a forward rate k, and the process can continue in the forward direction with 
rate k until it reaches the final absorbing point (cloud) at i = L. At each interior point, 
i ∈ {1, 2, ..., L - 1}, the process can also move one step to the left with a backward rate 
r or all the way back to the origin with a return or proofreading rate γ. The forward 
and the backward rates emphasize the reversibility of all reactions, and the return rate 
corresponds to a catastrophic failure, after which the whole process must start anew. For 
example, in immune signaling, γ would represent the rate of receptor-ligand dissociation, 
which destroys receptor cross-linking and prevents future forward events for a relatively 
long period of time [1]. 

This model is substantially simplified compared to detailed models of real biological 
processes [1] in that, in nature, all three rates may depend on i, and the nodes may 
not form a single linear chain. Even so, the detailed understanding of this simplified 
model provides an excellent starting point in the process of understanding these more 
complicated systems. Indeed, we will also show here that all qualitative conclusions 
made for the gKPR scheme also hold in numerical studies of more complicated systems 
in which rates are site dependent and where the connections of the nodes are much more 
varied than a simple linear chain. 
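The chain described above can be written down and solved numerically in a few lines. The following is a minimal sketch, not the authors' code: it assumes NumPy, and the function names `gkpr_transient_generator` and `completion_moments` are hypothetical. It builds the generator restricted to the non-absorbing sites i = 0, ..., L - 1 and obtains the mean and the squared coefficient of variation of the completion time from two linear solves, using the standard identity E[T^n] = n! 1^T (-Q)^(-n) e_0 for absorption times of a Markov chain.

```python
import numpy as np

def gkpr_transient_generator(L, k, r, gamma):
    """Generator restricted to the transient sites i = 0..L-1.

    Column convention: Q[i, j] is the rate of jumping from site j to site i,
    so dp/dt = Q p for the probabilities of the non-absorbing sites.
    """
    Q = np.zeros((L, L))
    for j in range(L):
        Q[j, j] -= k                 # forward step j -> j+1 (j = L-1 exits to site L)
        if j + 1 < L:
            Q[j + 1, j] += k
        if j >= 1:                   # backward step and return exist only for j >= 1
            Q[j, j] -= r + gamma
            Q[j - 1, j] += r
            Q[0, j] += gamma
    return Q

def completion_moments(Q):
    """Mean and squared coefficient of variation of the completion time,
    starting from site 0, via E[T^n] = n! * 1^T (-Q)^(-n) e_0."""
    n = Q.shape[0]
    e0 = np.zeros(n)
    e0[0] = 1.0
    x1 = np.linalg.solve(-Q, e0)     # (-Q)^(-1) e_0
    x2 = np.linalg.solve(-Q, x1)     # (-Q)^(-2) e_0
    mean = x1.sum()
    second_moment = 2.0 * x2.sum()
    return mean, second_moment / mean**2 - 1.0

if __name__ == "__main__":
    # Illustrative parameters only: k = 1, r = 0.5, gamma = 0 (forward-biased walk)
    for L in (4, 16, 64):
        print(L, *completion_moments(gkpr_transient_generator(L, 1.0, 0.5, 0.0)))
```

For long chains with a strong backward bias the mean completion time grows very quickly with L and the linear system becomes ill-conditioned, so this direct approach is best treated as a check for moderate L.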

1.2. The Relevant Features 

To determine if a kinetic model can be well approximated by a simpler one, we must 
first decide which of its features must be retained. To illustrate this question, consider 
the activation of a signaling cascade by an extracellular ligand (as represented in Fig. 
1). The ligand binding initiates the process, bringing it from state i = 0 to state i = 1. 
With the exception of this transition, the extracellular environment does not affect 
the process. Similarly, the downstream signaling pathways are only affected when the 
signaling construct attains its fully activated state at i = L. Thus, as far as the rest of the 
cell is concerned, only the times of process initiation and completion are controllable, 
observable or otherwise important. That is, the system can be characterized by the 
distribution of the first passage or the escape time between the release at i = 0 at t = 0 
and the completion at i = L [13]. Analyzing this distribution and demonstrating its very 
simple limiting behavior is the main contribution of our work. 

We note that, even though a lot is known about the first passage times in different 
scenarios [13, 14, 15] and about temporal dynamics of KPR schemes [9, 10, 11], to 
our knowledge, the distribution of the first passage time for KPR-type processes has not 
yet been analyzed rigorously, and little is known regarding how this first passage time 
depends upon biochemical parameters such as system size and reaction rates. 

2. Results 

In the following subsections, we provide precise analyses of three different cases of 
the gKPR scheme depicted in Fig. 1, each corresponding to a different continuous 
time / discrete space Markov chain with exponential transition times (our results can 
be generalized to the case of non-exponentially distributed transition times using the 
methods of [11]). First is a normal random walk process (that is, γ = 0) with an absorbing 
boundary at i = L and a reflecting boundary at i = 0. This model is denoted as the 
transmission mode (TM) process [13]. The second model is the directed KPR (dKPR) 
scheme, where k > 0, r = 0, and γ > 0. The third model is the full gKPR process, where 
all rates are non-zero. For each model, we provide exact solutions for the escape time 
distributions in the Laplace domain and explicit expressions for the means and variances 
of the escape times (see also derivations in Materials and Methods). By considering the 
squared coefficient of variation, CV², for these processes (see Figs. 3 and 5), we explore 
how these distributions change as the system parameters are adjusted and expose the 
fact that all three processes exhibit similar, yet not identical, behavior. In particular, we 
find that all three processes exhibit sharp transitions from near-deterministic (CV² ≪ 1) 
to exponential (CV² = 1) completion times as the critical parameters change, but 
that the actual location of this transition differs between the TM and KPR processes. 
Furthermore, we observe that all these processes have the same limiting behaviors on 
either side of the transition, and that the transition from one behavior to the other 
becomes sharper as the system size increases. Finally, in Subsections 2.5 and 2.6, we 
also numerically explore the same first passage time properties for more complicated 
cases where the reaction rates are site dependent, and where more complicated reaction 
events are possible. For these processes, we again observe the same simplifying behavior 
in the process dynamics and sharp transitions that depend on the size of the system 
(see Figs. 8 and 9). 

2.1. Transmission Mode (TM) 

For the TM process, in which the forward and backward rates (k and r) are non-zero, 
one can derive explicit expressions for the mean and the variance of the first passage 
time (see Materials and Methods: Transmission Mode). Defining θ = r/k, these can be 
written: 



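\[ \mu_{\rm TM} = \frac{1}{k}\,\frac{L - (L+1)\theta + \theta^{L+1}}{(1-\theta)^{2}}, \tag{1} \]

\[ \sigma^{2}_{\rm TM} = \frac{1}{k^{2}}\,\frac{L - 4\theta - (L+1)\theta^{2} + 4(L+1)\theta^{L+1} - 4L\theta^{L+2} + \theta^{2L+2}}{(1-\theta)^{4}}, \tag{2} \]

\[ {\rm CV}^{2}_{\rm TM} = \frac{\sigma^{2}_{\rm TM}}{\mu^{2}_{\rm TM}}, \tag{3} \]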
Figure 2. Effect of changing θ = r/k and L on the first passage time 
distribution for the TM process. The time has been rescaled for each curve as τ = t/μ. 
(A) First passage time distribution for different values of the backward rate, r, and 
a fixed length L = 8. Here r ranges from k/4 to 4k, as denoted in the boxes. The 
two dashed lines correspond to the limiting cases, θ = 0, ∞ (Γ-distribution and an 
exponential, respectively). (B, C) Effect of changing the length L on the escape time 
distribution (B) for θ = 0.5 and (C) for θ = 1.1. For θ < 1, the limiting behavior as 
L → ∞ is a delta function; for θ > 1, the limiting distribution is the exponential. 



where CV_TM is called the coefficient of variation. For a deterministic process, CV = 0, 
and for an exponentially distributed one, CV = 1. This makes the coefficient of variation 
a useful property characterizing a distribution. 
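The closed-form mean of the TM process can be cross-checked against a step-by-step decomposition of the walk. The sketch below is an illustration under stated assumptions rather than the derivation used in Materials and Methods: it splits the completion time into the independent first passage times from site i to site i + 1, whose means and variances obey m_i = 1/k + θ m_{i-1} and v_i = θ (v_{i-1} + m_{i-1}²) + m_i² (a recursion worked out here from the one-step Laplace transform). The function names are hypothetical.

```python
def tm_mean_closed_form(L, k, r):
    """Mean completion time of the TM process:
    mu = [L - (L+1)*theta + theta**(L+1)] / [k * (1 - theta)**2], theta = r/k."""
    theta = r / k
    if abs(theta - 1.0) < 1e-12:
        return L * (L + 1) / (2.0 * k)        # theta -> 1 limit of the formula
    return (L - (L + 1) * theta + theta ** (L + 1)) / (k * (1.0 - theta) ** 2)

def tm_moments_by_steps(L, k, r):
    """Mean and CV^2 obtained by summing the first passage times i -> i+1.

    m and v are the mean and variance of the passage from site i to i+1;
    the passages for different i are independent, so means and variances add.
    """
    theta = r / k
    m_prev = v_prev = 0.0
    mean = var = 0.0
    for _ in range(L):
        m = 1.0 / k + theta * m_prev
        v = theta * (v_prev + m_prev ** 2) + m ** 2
        mean += m
        var += v
        m_prev, v_prev = m, v
    return mean, var / mean ** 2

if __name__ == "__main__":
    # Illustrative values: L = 8, k = 1, and three backward rates
    for r in (0.25, 1.0, 4.0):
        print(r, tm_mean_closed_form(8, 1.0, r), *tm_moments_by_steps(8, 1.0, r))
```

With r = 0 the recursion reduces to a sum of L identical exponential steps, which gives CV² = 1/L, the value expected for a Γ-distribution with L stages.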

Fig. 2A-C shows the effects that changes in the parameters θ and L have on the 
distribution of the escape time. In order to show the distribution for diverse parameters 
simultaneously, time has been rescaled by the mean μ for each curve, τ = t/μ. This 
leads to the probability density f(τ) = μ f(t). Fig. 2A shows that, for a fixed L, as θ 
increases, the distribution becomes broader and approaches an exponential distribution, 
while as θ decreases the distribution approaches a Γ-distribution, Γ(L, 1/k). In order 
to quantify these behaviors we provide the trends of the mean and the coefficient of 
variation for the corresponding regimes. 

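\[ \mu_{\rm TM} \approx \begin{cases} \theta^{L-1}/k & \text{for } \theta \gg 1, \\ L/k & \text{for } \theta \ll 1, \end{cases} \tag{4} \]

\[ {\rm CV}^{2}_{\rm TM} \approx \begin{cases} 1 & \text{for } \theta \gg 1, \\ 1/L & \text{for } \theta \ll 1. \end{cases} \tag{5} \]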

It is worth mentioning that θ = 1 means an unbiased random walk, while θ > 1 (< 1) 
means a walk biased towards the entry (exit) point. 

Figs. 2B, C show that changes in L have different effects on the escape time 
distribution depending upon the value of θ. When θ < 1, the limiting distribution 
as L becomes large is a δ-function at t = L/[k(1 - θ)], whereas for θ > 1, the limiting 
distribution is an exponential with μ_TM = θ^(L+1)/[k(1 - θ)²]. 

Fig. 3 illustrates the effect that changes in L and θ have on μ_TM and CV²_TM, as 
given by Eqns. (1, 3). It is of particular interest to examine these as the chain becomes 
long. From Eq. (3), we see that, as L increases, CV²_TM converges point-wise to the step 
function: 

\[ \lim_{L\to\infty} {\rm CV}^{2}_{\rm TM}(L,\theta) = \begin{cases} 0 & \text{for } \theta < 1, \\ 1 & \text{for } \theta > 1. \end{cases} \tag{6} \]

[Panels: (A) Mean First Passage Time; (B) Coefficient of Variation. Horizontal axis: θ = r/k.] 

Figure 3. Effect of changing the length and backward rate, r, on the 
mean (A) and squared coefficient of variation (B) of the TM process first 
passage times. The curves have been computed using Eqns. (1, 3) and are plotted 
for increasing values of L = {1, 2, 4, 8, 16, 32}. 

Numerical analysis of Eq. (3) around θ = 1 shows that the maximum slope of CV²_TM (to 
leading order in L) occurs at a point that approaches θ = 1 at a rate: 

The slope at 6 = 1 - 21/(21.2) jg. 

Thus for a given large L, the range of θ over which the first passage time changes from a 
narrow Γ-distribution to a broad exponential distribution is centered just left of θ = 1, 
and it becomes increasingly narrow as L increases. 
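The sharpening of the transition can be explored numerically by scanning θ for several chain lengths. The sketch below is an illustration, not the analysis used in the text: it assumes NumPy, sets k = 1 as the time unit, evaluates CV² with the step-by-step recursion m_i = 1 + θ m_{i-1}, v_i = θ (v_{i-1} + m_{i-1}²) + m_i², and estimates the steepest slope with a finite-difference gradient on a hypothetical grid of θ values.

```python
import numpy as np

def tm_cv2(L, theta):
    """Squared coefficient of variation of the TM completion time (k = 1).

    theta may be a scalar or an array; the recursion is applied elementwise.
    """
    theta = np.asarray(theta, dtype=float)
    m_prev = np.zeros_like(theta)
    v_prev = np.zeros_like(theta)
    mean = np.zeros_like(theta)
    var = np.zeros_like(theta)
    for _ in range(L):
        m = 1.0 + theta * m_prev                     # mean passage time i -> i+1
        v = theta * (v_prev + m_prev**2) + m**2      # its variance
        mean = mean + m
        var = var + v
        m_prev, v_prev = m, v
    return var / mean**2

if __name__ == "__main__":
    theta = np.linspace(0.2, 2.0, 1801)              # grid choice is arbitrary
    for L in (8, 32, 128):
        cv2 = tm_cv2(L, theta)
        slope = np.gradient(cv2, theta)
        i = int(np.argmax(slope))
        print(f"L={L:4d}  steepest slope {slope[i]:.3f} at theta={theta[i]:.3f}")
```

Printing the location and size of the steepest slope for increasing L gives a direct, if rough, view of how the transition narrows; the grid spacing limits the resolution of the reported location.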

2.2. Directed Kinetic Proofreading (dKPR) 

For the dKPR process, the system can return directly to the origin with rate γ > 0, 
but the backward rate, r, is zero. Then, defining ψ = γ/k, the mean and the coefficient 
of variation of the first passage times are (see Materials and Methods: Directed Kinetic 
Proofreading): 

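\[ \mu_{\rm dKPR} = \frac{1}{k\psi}\left[(1+\psi)^{L} - 1\right], \tag{9} \]

\[ {\rm CV}^{2}_{\rm dKPR} = \frac{(1+\psi)^{2L} - 2\psi L\,(1+\psi)^{L-1} - 1}{(1+\psi)^{2L} - 2(1+\psi)^{L} + 1}. \tag{10} \]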

Fig. 4A-B shows the effects that changes in ψ and L have on the distribution of the 
waiting times for the dKPR process. As in the previous section, time has been rescaled 
by μ for each curve. For a fixed L, the distribution again approaches an exponential 
distribution as γ → ∞ and a Γ-distribution as γ → 0. Unlike 
for the TM process, the limiting distribution as L → ∞ is exponential for any value of 
ψ > 0. 

Simplification of Biochemical Completion Times 

[Horizontal axis in both panels: Time, τ = t/μ.] 

Figure 4. Effect of changing ψ = γ/k and L on the first passage time 
distribution (normalized by its mean) for the dKPR process. (A) The first 
passage time distribution for different values of the return rate, γ, and a fixed length 
L = 8. The parameter ψ ranges from 1/64 to 1 as denoted in the figure. The two 
dashed lines correspond to the limiting cases, where ψ = 0, ∞. The former results in 
a Γ-distribution, and the latter in an exponential distribution. (B) Effect of changing 
the length L on the first passage time distribution for ψ = 1/8. For any value of ψ > 0, 
the limiting behavior as L → ∞ is an exponential distribution. 

In Fig. 5 we illustrate the dependence of μ_dKPR and CV²_dKPR on L and ψ. From 
Eqns. (9, 10), their limiting behaviors are: 

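\[ \mu_{\rm dKPR}(L,\psi) \approx \begin{cases} \psi^{L-1}/k & \text{for } \psi \gg L, \\ L/k & \text{for } \psi \ll 3/L^{2}, \end{cases} \tag{11} \]

\[ {\rm CV}^{2}_{\rm dKPR}(L,\psi) \approx \begin{cases} 1 & \text{for } \psi \gg L, \\ 1/L & \text{for } \psi \ll 3/L^{2}. \end{cases} \tag{12} \]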



Furthermore, as L grows, the coefficient of variation tends to converge point-wise to a 
step function at ψ = 0: 

\[ \lim_{L\to\infty} {\rm CV}^{2}_{\rm dKPR} = \begin{cases} 0 & \text{for } \psi = 0, \\ 1 & \text{for } \psi > 0. \end{cases} \tag{13} \]

As in the TM process, this convergence can be studied by examining the maximum 
slope of the coefficient of variation. Since the second derivative of CV²_dKPR is always 
negative for ψ > 0, this maximum slope occurs at ψ = 0. Taking the derivative of Eqn. 
(10) at the point ψ = 0 yields an exact expression for the maximal slope, 

\[ \max_{\psi} \frac{d\,{\rm CV}^{2}_{\rm dKPR}}{d\psi} = \left.\frac{d\,{\rm CV}^{2}_{\rm dKPR}}{d\psi}\right|_{\psi=0} = \frac{L^{2}-1}{3L}. \tag{14} \]

These trends are readily apparent in Fig. 5B, where as L or ψ increase, CV² approaches 
unity. 
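The dKPR expressions are simple enough to evaluate directly. The sketch below restates them as two small functions, with hypothetical names, so that the limits can be inspected for any L and ψ: the mean is [(1 + ψ)^L - 1]/(kψ), and CV² is the ratio [(1 + ψ)^(2L) - 2ψL(1 + ψ)^(L-1) - 1] / [(1 + ψ)^L - 1]². The ψ = 0 branch uses the Γ-distribution limit, mean L/k and CV² = 1/L, because the formulas become 0/0 there.

```python
def dkpr_mean(L, k, gamma):
    """Mean completion time of the dKPR process with psi = gamma / k."""
    psi = gamma / k
    if psi == 0.0:
        return L / k                       # L sequential exponential steps
    return ((1.0 + psi) ** L - 1.0) / (k * psi)

def dkpr_cv2(L, psi):
    """Squared coefficient of variation of the dKPR completion time."""
    if psi == 0.0:
        return 1.0 / L                     # Gamma-distribution limit
    a = (1.0 + psi) ** L
    numerator = a * a - 2.0 * psi * L * a / (1.0 + psi) - 1.0
    denominator = (a - 1.0) ** 2
    return numerator / denominator

if __name__ == "__main__":
    # Illustrative scan: CV^2 rises from 1/L toward 1 as psi grows
    for L in (4, 16, 64):
        print(L, [round(dkpr_cv2(L, psi), 4) for psi in (0.0, 0.01, 0.1, 1.0)])
```

For very small but non-zero ψ the numerator and denominator are both differences of nearly equal numbers, so a series expansion would be a safer choice than direct evaluation in that corner.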
[Panels: (A) Mean First Passage Time; (B) Coefficient of Variation.] 

Figure 5. Effect of changing the length and the proofreading rate, γ, on the 
mean (A) and the squared coefficient of variation (B) of the escape time 
for the dKPR system. The curves have been computed analytically using Eqns. (9, 
10) and are plotted for increasing values of L = {1, 2, 4, 8, 16, 32}. 



2.3. Comparison between the TM and the dKPR models 

The TM and the dKPR processes exhibit very similar behaviors in their first passage 
time distributions: for a fixed large L, increases in θ or ψ result in sharp transitions 
from deterministic to exponential completion times. Moreover, the two processes have 
quantitatively the same limiting behaviors on either side of the transition: the means 
and the CVs are asymptotically the same functions of θ and ψ [cf. Eqs. (4, 5, 11, 12)]. 

However, the similarity between the limits of both processes is not exact. For the 
TM, the deterministic-to-exponential transition (defined by the point of the maximum 
slope of CV²) is near θ = 1, approaching it as L grows [cf. Eq. (7)], while the same 
transition for the dKPR is always at ψ = 0. Moreover, although for both models the 
width of the transition region, as defined by the maximum slope of CV², is inversely 
proportional to the system size (for L ≫ 1), the width is 15/4 times larger for the TM 
process. Finally, while the small/large θ and ψ limits are the same in both models, the 
terms small and large themselves have different meanings. In particular, for the TM 
model the meanings are effectively independent of the system size (Eqn. 5), while for 
the dKPR model the meanings strongly depend on L (Eqn. 12). 



2.4. General Kinetic Proofreading (gKPR) 

In the most general case, both r > 0 and γ > 0. Still, one can derive explicit expressions 
for the mean and variance of the first passage times (see Eqns. 39, 40 in Materials and 
Methods). Fig. 6 illustrates the probability distribution for the exit times of the gKPR 
process for different θ, ψ, and L. Based upon the previous results, it is no surprise that 
the escape time distributions converge to an exponential distribution as ψ or θ become large 
(cf. Fig. 6A, B), or to a Γ-distribution when ψ = θ = 0. It is also not surprising that 
the gKPR first passage time distribution converges to an exponential distribution when 
γ > 0 and L is large (cf. Fig. 6C). What is surprising is how neatly the two constituent 
processes, TM and dKPR, combine to define the trends of the gKPR process. 

Figs. 7A-D show the mean and the coefficient of variation of the first passage time 
distributions for this process under various conditions. In panel A, we plot μ_gKPR as a 
function of θ and ψ for a fixed system size of L = 8, and panel B shows the corresponding 
CV²_gKPR. Panels C and D show the same information, but for L = 16. We see that the 
general trend for the increase in the mean passage time and the convergence of the 
CV² are determined in the same manner as those for the TM and dKPR processes. In 
particular, we find that the contour lines for both μ_gKPR and CV²_gKPR are almost 
linear. However, this linearity is not exact: the actual contour lines for μ_gKPR(ψ, θ) are 
slightly concave and the contour lines for CV²_gKPR(ψ, θ) are slightly convex. From Figs. 
3 and 5 above, we see that changes in L have a large effect on the first passage time of 
the TM and dKPR processes, particularly around θ = 1 and ψ = 0, respectively. In the 
gKPR process, these effects correspond to changes in the endpoints, and therefore the 
slopes of the contour lines in Fig. 7A-D. 

[Horizontal axis in all panels: Time, τ = t/μ.] 

Figure 6. The escape time probability density function for the gKPR 
scheme. (A) ψ = γ/k = 1/8, L = 8, and variable θ = r/k. (B) θ = 1/2, 
L = 8 and variable ψ. (C) θ = 1/2, ψ = 1/8, and variable L. In all cases, the limiting 
behavior is an exponential as L, θ, or ψ grow. 

[Horizontal axis: θ = r/k.] 

Figure 7. Effects of parameter variation on the escape time distribution 
for the gKPR process. (A) Mean completion time versus θ and ψ for L = 8. (B) 
Coefficient of variation, CV²_gKPR, versus θ and ψ for L = 8. (C, D) The same for L = 16. 

With explicit expressions for the mean and coefficient of variation, one can again 
examine their limiting behaviors for growing ψ and θ. In particular, we find that these 
are equal to those of the TM and the dKPR models when θ → ∞ or ψ → ∞, respectively. 
Further, if L is large and ψ > 0, the mean first passage time is: 



lim MgKPR - ^ I 1 + 1 , (15) 



where 




l+O = ^ ^ > 1. (16) 



2 

t2 



Further, the coefficient of variation, CV²_gKPR, approaches unity for all values except 
when ψ = 0 and θ < 1, and 



1 - 77^ foi- 7/; > 2L and e > 4, 



^"^^""^ ' ^ I 1/L for f ande< |. ^ ^ 

This shows that, for large proofreading and backward rates, the two effects have equal 
influences on the distribution of the completion time. However, one should bear in mind 
that, again, the meaning of small/large θ and ψ is different. 



2.5. Kinetic Proofreading with Site-Dependent Rates 

The previous subsections have shown that the TM, dKPR and gKPR processes all 
exhibit a similar simplification of behavior when all rates are the same at every 
intermediate state in the process. In reality, these rates may vary from one site to 
the next since each transition may correspond to a different physical reaction. In the 
case of the dKPR, one can still derive expressions for the first passage time distributions 
(see Materials and Methods), and in the case of more complicated processes, one can 
explore these distributions numerically. To illustrate the effects of such variation, we 
have numerically explored a gKPR process where every rate is different, but chosen from 
some relatively broad lognormal distribution. Fig. 8 shows how such site-dependent 
rates affect the coefficient of variation for the gKPR process. Here all forward and 
backward rates, {r_i, k_i, γ_i}, have been generated from the same distribution, and then 
the backward rates {γ_i, r_i} have been scaled uniformly by a parameter, α, that has been 
used to adjust the bias from completely forward (α = 0) to backward (α ≫ 0). From 
Fig. 8 we see once again that there is a sharp transition from when the coefficient of 
variation is small at α = 0 to when the coefficient of variation is near one when the bias 
is backward. As in the previous systems, this transition depends upon the length 
of the system: longer lengths correspond to sharper transitions. Furthermore, as the 
lengths increase, variation in the parameters appears to become less important, as can 
be seen by comparing the variation in the curves corresponding to L = 100 (blue 
curves) to those for a smaller length of L = 40 (black curves). 
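A numerical experiment of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind Fig. 8: it assumes NumPy, the function names are hypothetical, the log-rates are drawn with mean 0 and a user-chosen standard deviation `sigma` (the text does not state the distribution parameters), and the chain length is kept short because the linear solves lose accuracy when the mean completion time becomes astronomically large.

```python
import numpy as np

def site_dependent_cv2(k, r, gamma):
    """CV^2 of the completion time for a linear chain with per-site rates.

    k[j], r[j], gamma[j] are the forward, backward and return rates out of
    site j; r[0] and gamma[0] are ignored because site 0 is the origin.
    """
    L = len(k)
    Q = np.zeros((L, L))                  # Q[i, j]: rate of the jump j -> i
    for j in range(L):
        Q[j, j] -= k[j]
        if j + 1 < L:
            Q[j + 1, j] += k[j]
        if j >= 1:
            Q[j, j] -= r[j] + gamma[j]
            Q[j - 1, j] += r[j]
            Q[0, j] += gamma[j]
    e0 = np.zeros(L)
    e0[0] = 1.0
    x1 = np.linalg.solve(-Q, e0)          # gives the mean: sum(x1)
    x2 = np.linalg.solve(-Q, x1)          # gives E[T^2] / 2: sum(x2)
    return 2.0 * x2.sum() / x1.sum() ** 2 - 1.0

def sample_rates(L, sigma, rng):
    """Draw three independent lognormal rate vectors (assumed log-mean 0)."""
    return tuple(rng.lognormal(0.0, sigma, L) for _ in range(3))

if __name__ == "__main__":
    rng = np.random.default_rng(0)        # fixed seed, for reproducibility only
    k, r, g = sample_rates(12, 1.0, rng)
    for alpha in (0.0, 0.05, 0.2, 1.0):   # alpha scales all backward-type rates
        print(alpha, site_dependent_cv2(k, alpha * r, alpha * g))
```

Repeating the scan over many random draws and several chain lengths would give curves comparable in spirit to those described above, although the exact sampling choices here are guesses.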
Figure 8. Coefficient of variation for a gKPR process with random 
parameters versus backward to forward bias. The length of the process is 
either 40 (black lines) or 100 (blue lines) and all rates k_i, γ_i/α and r_i/α are taken 
independently from the lognormal distribution shown in the inset. The three panels 
correspond to three increasingly narrow distributions for the parameters. 



2.6. Multiple Leap Completion Processes 

In addition to the gKPR scheme illustrated by Fig. 1, we also explore a much more 
general set of multistep completion processes where reactions can take the system not 
just one, but many steps toward the completion state or toward the initial state. In 
terms of chemical processes, these multiple step jumps could correspond to additions or 
removals of different multi-molecular complexes rather than just individual molecules. In 
this case there are now many different interconnected pathways by which the process can 
travel from state i = 0 to i = L. In such systems, the master equation, dP/dt = A·P(t), 
has an infinitesimal generator, A, given by A = αB + F, where the "backward" matrix, B, 
is upper-triangular and represents reactions that allow the system to return an arbitrary 
number of states backwards with certain site-dependent rates, and the "forward" matrix 
F is a lower-triangular banded matrix, which allows for different forward jumps of lengths 
m < L, again with site-dependent rates. Since m is constrained to be less than L, there 
is always a minimum of about L/m jumps necessary to complete the process. 

In the expression of the infinitesimal generator, α controls the bias, and we show 
once again that there is a sharp threshold between an almost deterministic and an 
exponential behavior as α grows. For this arbitrary process, we have randomly generated 
hundreds of realizations, each with different site-dependent rates taken from a broad 
lognormal distribution, and we find that for every such parameter set, there is a 
sharp transition from a narrow "deterministic" to a broad exponential waiting time 
distribution, as can be seen in Fig. 9. Furthermore, despite drastic differences in the 
randomly chosen parameters, we find that the dynamical behaviors of the systems are 
so close that it is difficult to distinguish one parameter set from the next based solely on 
the waiting time. Finally, we find the same dependence of this transition on the size of 
the system as has been observed for dKPR and gKPR processes (compare the process 
with 40 steps (black lines) to the process with 100 steps (blue lines) in Fig. 9). 
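The construction A = αB + F can be sketched in code as follows. This is an illustrative reading of the description above, not the authors' implementation: it assumes NumPy, forward leaps of up to `max_jump` steps that never overshoot the final state, backward leaps from each non-absorbing state to every earlier state, lognormal rates with assumed log-mean 0 and standard deviation `sigma`, and hypothetical function names.

```python
import numpy as np

def leap_generator(L, alpha, max_jump=3, sigma=1.0, seed=0):
    """Build A = alpha*B + F on states 0..L, with state L absorbing.

    Column convention: A[i, j] is the rate of the jump j -> i, so that
    dP/dt = A P and every column of A sums to zero.
    """
    rng = np.random.default_rng(seed)
    F = np.zeros((L + 1, L + 1))
    B = np.zeros((L + 1, L + 1))
    for j in range(L):                                # nothing leaves state L
        for i in range(j + 1, min(j + max_jump, L) + 1):
            F[i, j] = rng.lognormal(0.0, sigma)       # forward leap of i - j steps
        for i in range(j):
            B[i, j] = rng.lognormal(0.0, sigma)       # backward leap to state i < j
    F -= np.diag(F.sum(axis=0))                       # diagonal = minus total exit rate
    B -= np.diag(B.sum(axis=0))
    return alpha * B + F

def completion_cv2(A):
    """CV^2 of the time to reach the last state, starting from state 0."""
    Q = A[:-1, :-1]                                   # drop the absorbing state
    e0 = np.zeros(Q.shape[0])
    e0[0] = 1.0
    x1 = np.linalg.solve(-Q, e0)
    x2 = np.linalg.solve(-Q, x1)
    return 2.0 * x2.sum() / x1.sum() ** 2 - 1.0

if __name__ == "__main__":
    # Same seed for every alpha, so only the bias changes between rows
    for alpha in (0.0, 0.01, 0.1, 1.0):
        print(alpha, completion_cv2(leap_generator(12, alpha)))
```

Because the seed is fixed, B and F are identical for every α in the loop, which isolates the effect of the bias. Chain lengths of 40 or 100 with large α would need a more careful numerical treatment than two dense linear solves.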
Figure 9. Coefficient of variation for an arbitrary kinetic proofreading like 
process with random parameters versus backward to forward bias. The 
master equation for this process is dP(t)/dt = (αB + F)P(t), where F is banded such 
that the system can move 1, 2, or 3 steps forward in a single jump, and B is upper 
triangular such that the process can move any number of steps backwards. The length 
of the process is either 40 (black lines) or 100 (blue lines), and each non-zero element of 
F and B is randomly chosen from the lognormal distribution plotted in the inset. The 
three panels correspond to three increasingly narrow distributions for the parameters. 



3. Discussion 

The results for the coefficient of variation of the escape time distribution, as well as 
the shapes of the distributions themselves, clearly show that the kinetic proofreading 
process and other multistep completion processes have two simple limiting behaviors 
as the system size increases. First, when the overall bias is forward, the completion 
time becomes narrowly distributed. Second, when the overall bias is backward, the 
escape time distribution approaches an exponential. Both of these behaviors are 
substantially simpler than one could have expected from the original complex 
kinetic diagram, implying that the observable behavior of this complex system can 
be approximated accurately by a single-parameter equivalent, corresponding either to a 
deterministic reaction or a simple two-state Markov chain. Interestingly, the approach 
to the deterministic regime as the system size grows is well understood (see, for example, 
[18] on the discussion of this effect in the context of reproducibility of responses of rod 
cells to single photon capture events). However, the exponential regime has not been 
explored extensively before, even though it is the more robust of the two, emerging for 
any ψ > 0. 

Both limiting behaviors of these systems are explainable by simple intuitive 
arguments. First, a system with a forward bias completes the entire process in a 
certain characteristic time, and the relative standard deviation of this time scales as 
1/√(number of steps), as is always the case for the addition of independent identically 
distributed random variables. In the opposite case, the backward bias ensures that the 
process repeatedly returns to the initial state, from which many independent escape 
attempts are made. Due to the independence, the number of such attempts before a 
success has a geometric distribution (the discrete analog of an exponential distribution) , 
and its form effectively defines the first passage time distribution. In other words, the 
system tries to climb out of a free energy well (with the ground state near the entry 
point), and escape times in such cases are usually exponentially distributed. 
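Both intuitive arguments can be probed with a direct stochastic simulation of the chain. The sketch below is a plain Gillespie-style simulation written for illustration only: it assumes NumPy, site-independent rates k, r, γ as in the gKPR scheme, and a hypothetical function name. With a forward bias the sample CV² should come out close to 1/(number of steps), and with a strong return rate close to 1, up to sampling noise.

```python
import numpy as np

def simulate_completion_times(L, k, r, gamma, n_runs, seed=0):
    """Sample completion times of the gKPR chain by direct simulation.

    From site i >= 1 the walker steps forward (rate k), backward (rate r)
    or returns to the origin (rate gamma); from site 0 it can only step forward.
    """
    rng = np.random.default_rng(seed)
    times = np.empty(n_runs)
    for n in range(n_runs):
        i, t = 0, 0.0
        while i < L:
            back = r if i >= 1 else 0.0
            ret = gamma if i >= 1 else 0.0
            total = k + back + ret
            t += rng.exponential(1.0 / total)     # waiting time in the current site
            u = rng.random() * total              # choose which reaction fires
            if u < k:
                i += 1
            elif u < k + back:
                i -= 1
            else:
                i = 0
        times[n] = t
    return times

if __name__ == "__main__":
    # Illustrative parameter choices, not taken from the figures
    for label, (r, gamma) in {"forward bias": (0.1, 0.0),
                              "backward bias": (0.0, 1.0)}.items():
        t = simulate_completion_times(8, 1.0, r, gamma, 2000)
        print(label, "mean:", t.mean(), "CV^2:", t.var() / t.mean() ** 2)
```

The number of runs trades accuracy for time; the statistical error of the CV² estimate shrinks roughly as the inverse square root of `n_runs`.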

Although the KPR models most rigorously analyzed here are relatively simple linear 
chain processes with site-independent transition rates, our numerical studies strongly 
suggest that the conclusions we make generalize to more complicated systems. We have 
shown numerically that our conclusions do not change when the kinetic rates k, r, γ 
are site-specific and/or when the reactions allow for certain states to be skipped and 
for there to be many different interconnected pathways by which the process may be 
completed. Similarly, if biochemical processes involve multiple independent pathways, 
each with exponential/deterministic waiting times, then the first of these pathways 
to complete will also be exponential/deterministic. Furthermore, first passage times 
for higher dimensional random walks also frequently exhibit simplified dynamics, as has 
been shown via reductions to a stochastic model of the genetic toggle switch [19]. Finally, 
the "free energy well" argument says that the overall bias of a system's motion will 
control the choice between the exponential (Markovian) and the deterministic behaviors 
even for more complex systems. In particular, it is clear that any KPR-like system, 
where a strong backward bias is required to undo potential mistakes, is likely to fall in 
the exponential escape time distribution regime. 

Given that so much structural complexity is used to achieve a very simple dynamics 
in these processes, it is natural to ask why the complexity is used at all. One 
hypothesis is that such agglomeration of multiple independent kinetic parameters into 
a few coarse-grained variables means that multiple chemotypes can result in the same 
phenotype. Thus, the system possesses many situationally sensitive knobs with which it 
can compensate for environmental changes and maintain a few simple behaviors. Such 
adaptive flexibility has been observed in a variety of contexts [20, 21, 22]. An alternative 
hypothesis may be that these extra elements are vestigial network components to which 
the cell is insensitive in its current evolutionary or developmental situation. The current 
work provides a starting point to evaluate these possibilities via parametric sensitivity 
analysis. 

Finally, the fact that the KPR process, as well as many others, has such simple 
limiting behaviors has important consequences for the modeling of biochemical systems. 
The bad news is that it is unreasonable to hope to characterize individual molecular 
reactions with observations of the input-to-output responses: many different internal 
organizations will result in equivalent observable behaviors. The good news is that, 
when attempting to understand such processes in a wider cellular context, it is often 
unnecessary to explicitly treat every individual step; a coarse-grained model with only 
a handful of aggregate parameters may be sufficient. This result clearly explains why 
simple phenomenological Markovian reaction rate models of complicated processes, such 
as transcription, translation, enzyme activation and others, have had such a great success 
in explaining biological data. 

Materials and Methods 



Preliminaries 



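Let the vector $\mathbf{p}(t) = [p_0(t), p_1(t), \ldots, p_L(t)]^T$ denote the probabilities of each state in the 
kinetic diagram in Fig. 1. This distribution evolves according to the Master Equation 
(ME), which can be written as $\dot{\mathbf{p}}(t) = \mathbf{A}\mathbf{p}(t)$, where the infinitesimal generator matrix $\mathbf{A}$ 
is: 

$$
A_{ij} = \begin{cases}
-k & \text{for } i = j = 0,\\
-k - \gamma - r & \text{for } 0 < i = j \le L-1,\\
\gamma + r & \text{for } (i,j) = (0,1),\\
\gamma & \text{for } i = 0 \text{ and } 2 \le j \le L-1,\\
r & \text{for } i = j-1 \text{ and } 2 \le j \le L-1,\\
k & \text{for } i = j+1 \text{ and } 0 \le j \le L-1,\\
0 & \text{everywhere else.}
\end{cases}
\tag{18}
$$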
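
A minimal sketch of how Eq. (18) translates into code, assuming uniform rates and the convention that `A[i][j]` is the rate from state `j` to state `i`. The function name `build_generator` is illustrative. Each column should sum to zero, which expresses conservation of probability, and the column of the absorbing state `L` is identically zero.

```python
def build_generator(L, k, r, gamma):
    """Return the (L+1) x (L+1) generator matrix of Eq. (18) as nested lists."""
    n = L + 1
    A = [[0.0] * n for _ in range(n)]
    for j in range(L):                   # transient source states 0..L-1
        A[j + 1][j] += k                 # forward step j -> j+1
        if j >= 1:
            A[0][j] += gamma             # return to the origin
            A[j - 1][j] += r             # backward step j -> j-1
        A[j][j] = -(k if j == 0 else k + gamma + r)   # total exit rate
    return A

A = build_generator(4, 1.0, 0.5, 0.2)
column_sums = [sum(A[i][j] for i in range(5)) for j in range(5)]
```

Note that for `j = 1` the return and the backward step both land in state 0, which reproduces the combined entry gamma + r at position (0, 1).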



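By applying the Laplace transform, 

$$ P_i(s) = \int_0^\infty p_i(t)\, e^{-st}\, dt, \tag{19} $$

one can convert the ME to a set of linear algebraic equations: 

$$ (s\mathbf{I} - \mathbf{A})\,\mathbf{P}(s) = \mathbf{p}(t=0) = \mathbf{e}_0. \tag{20} $$

Note that this equation includes the specification of the initial condition, $p_i(t=0) = \delta_{i,0}$, 
where $\delta$ is the Kronecker delta and $\mathbf{I}$ is the identity matrix. 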
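
Before turning to the analytical solution, Eq. (20) can also be solved numerically for any given value of `s`. The sketch below is one way to do this with plain Gaussian elimination, restricted to the transient states 0..L-1 (the row for the absorbing state is dropped, which is an assumption made here to keep the system non-singular at s = 0). The helper names `solve_linear` and `laplace_occupancies` are hypothetical.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - acc) / aug[i][i]
    return x

def laplace_occupancies(s, L, k, r, gamma):
    """Return [P_0(s), ..., P_{L-1}(s)] from (sI - A) P = e_0."""
    M = [[0.0] * L for _ in range(L)]
    for j in range(L):
        M[j][j] = s + (k if j == 0 else k + gamma + r)
        if j + 1 < L:
            M[j + 1][j] -= k             # forward step j -> j+1
        if j >= 1:
            M[0][j] -= gamma             # return to the origin
            M[j - 1][j] -= r             # backward step j -> j-1
    rhs = [1.0] + [0.0] * (L - 1)        # initial condition e_0
    return solve_linear(M, rhs)

P = laplace_occupancies(0.5, 3, 2.0, 0.0, 0.0)
```

Multiplying the last entry by `k` gives the transform of the completion time density; at s = 0 this product should equal 1, because the process completes with certainty.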

We now construct a general solution for this equation in the form 

$$ P_i(s) = C_1 \lambda_1^i + C_2 \lambda_2^i. \tag{21} $$

Inserting this into the expression for $0 < i < L-1$, one finds that the space-independent 
parameters $\lambda_{1,2}$ satisfy 

$$ \frac{k}{s+k+\gamma+r} + \frac{r}{s+k+\gamma+r}\,\lambda^2 - \lambda = 0. \tag{22} $$
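
For reference, the intermediate step can be spelled out as follows. This is a worked restatement of the interior rows of Eq. (20) under the ansatz of Eq. (21), not an additional result.

```latex
% Row i of Eq. (20), for 0 < i < L-1:
(s + k + \gamma + r)\, P_i(s) = k\, P_{i-1}(s) + r\, P_{i+1}(s).
% Substituting P_i = \lambda^i and dividing by \lambda^{i-1} (s + k + \gamma + r):
\lambda = \frac{k}{s+k+\gamma+r} + \frac{r}{s+k+\gamma+r}\,\lambda^2 ,
% so that the two roots obey
\lambda_1 + \lambda_2 = \frac{s+k+\gamma+r}{r},
\qquad
\lambda_1 \lambda_2 = \frac{k}{r}.
```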
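Similarly, the coefficients $C_1$ and $C_2$ must obey the equations for $P_0(s)$ and $P_{L-1}(s)$ in 
(20), which can be written as 

$$ (s+k)(C_1 + C_2) = 1 + r\,(C_1\lambda_1 + C_2\lambda_2) + \gamma\left[ C_1\left(\frac{1-\lambda_1^L}{1-\lambda_1} - 1\right) + C_2\left(\frac{1-\lambda_2^L}{1-\lambda_2} - 1\right)\right], \tag{23} $$

$$ C_1\lambda_1^{L-1} + C_2\lambda_2^{L-1} = \frac{k}{s+k+r+\gamma}\left(C_1\lambda_1^{L-2} + C_2\lambda_2^{L-2}\right), \tag{24} $$

where we have applied the geometric series identity, $\sum_{i=1}^{L-1}\lambda^i = \frac{1-\lambda^L}{1-\lambda} - 1$. 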



Since $p_L(t)$ is the cumulative probability that the system has reached the absorbing 
state, the first passage time probability density, $f(t) = dp_L(t)/dt$, can be written in the 
Laplace domain as: 

$$ F(s) = kP_{L-1}(s). \tag{25} $$
Once this quantity is known, all uncentered moments of the escape time are easily 
derived as 

$$ \langle t^m \rangle = \int_0^\infty t^m f(t)\,dt = (-1)^m \left.\frac{d^m F(s)}{ds^m}\right|_{s=0}. \tag{26} $$
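
When a closed form for the derivatives is inconvenient, Eq. (26) can be approximated numerically. The sketch below uses central finite differences at s = 0. The helper name `moments_from_laplace`, the step size `h`, and the test transform (L irreversible steps of rate k) are assumptions chosen for illustration.

```python
def moments_from_laplace(F, h=1e-3):
    """Estimate the mean and variance from F(s) via central differences."""
    d1 = (F(h) - F(-h)) / (2 * h)               # approximates F'(0)
    d2 = (F(h) - 2 * F(0.0) + F(-h)) / h ** 2   # approximates F''(0)
    mean = -d1                                  # <t>   = -F'(0)
    second = d2                                 # <t^2> = +F''(0)
    return mean, second - mean ** 2

# Illustrative test transform: L irreversible steps, F(s) = (k / (s + k))**L
L, k = 5, 2.0
mean, var = moments_from_laplace(lambda s: (k / (s + k)) ** L)
```

For this test transform the exact values are L/k for the mean and L/k^2 for the variance, which gives a simple way to check the chosen step size. The step `h` has to stay well inside the region where F(s) is analytic, so it should be small compared with the inverse of the mean.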



With this in mind, we now consider the three special cases in the following subsections. 

Transmission Mode 

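The first case to be considered is transmission mode: the continuous time, discrete 
space random walk, where the process can only move forward or backward to its 
nearest neighbor ($\gamma = 0$). Applying the boundary conditions as expressed in Eqs. (23) and (24) yields 
the expressions for $C_1$ and $C_2$: 

$$ C_1 = \frac{1}{(s+k-r\lambda_2)}\left[\frac{\lambda_2-1}{\lambda_1-1} - \frac{\lambda_1^L}{\lambda_2^L}\right]^{-1}, \quad\text{and}\quad C_2 = -C_1\,\frac{\lambda_1^L}{\lambda_2^L}, \tag{27} $$

where $\lambda_1$ and $\lambda_2$ are obtained from Eq. (22): 

$$ \lambda_{1,2} = \frac{s+k+r \pm \sqrt{(s+k+r)^2 - 4kr}}{2r}. \tag{28} $$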
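
The relation between $C_2$ and $C_1$ can be checked in a few lines. The sketch below uses only the ansatz of Eq. (21), the characteristic equation (22) and the boundary condition (24), and it holds for any value of $\gamma$.

```latex
% Eq. (24) is the row for state L-1:
(s+k+r+\gamma)\,P_{L-1} = k\,P_{L-2},
\qquad P_i = C_1\lambda_1^i + C_2\lambda_2^i .
% Eq. (22) gives (s+k+r+\gamma)\,\lambda = k + r\lambda^2 for each root, hence
(s+k+r+\gamma)\,\lambda^{L-1} - k\,\lambda^{L-2} = r\,\lambda^{L} .
% Applying this to both roots, Eq. (24) reduces to
r\left(C_1\lambda_1^{L} + C_2\lambda_2^{L}\right) = 0
\quad\Longrightarrow\quad
C_2 = -C_1\,\frac{\lambda_1^{L}}{\lambda_2^{L}} .
```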



Following simple algebra, the Laplace transform of the first passage time probability 
density function (PDF) then becomes 

$$ F(s) = C_1 k \lambda_1^{L-1}\left(1 - \frac{\lambda_1}{\lambda_2}\right), \tag{29} $$

from which all moments of the first passage time can be extracted. In particular, the 
mean escape time and the variance are given by Eqs. (1, 2) in the main text. 
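
The closed form can be checked numerically. The sketch below evaluates F(s) for transmission mode and compares its mean with an independent estimate from the standard birth-death recursion m_i = 1/k + (r/k) m_{i-1}, which is a textbook result brought in here as a cross-check and is not part of the derivation above. The function names are hypothetical. The coefficient `c1` is written directly from the $P_0$ boundary condition with $\gamma = 0$, which is algebraically the same coefficient but avoids dividing by a factor that can vanish at s = 0.

```python
import cmath

def transmission_F(s, L, k, r):
    """Closed-form F(s) for the nearest-neighbour walk (gamma = 0), L >= 2."""
    root = cmath.sqrt((s + k + r) ** 2 - 4 * k * r)
    lam1 = (s + k + r + root) / (2 * r)
    lam2 = (s + k + r - root) / (2 * r)
    ratio = (lam1 / lam2) ** L                   # so that C2 = -C1 * ratio
    # P_0 boundary condition with gamma = 0, solved for C1
    c1 = 1.0 / ((s + k - r * lam1) - ratio * (s + k - r * lam2))
    return (c1 * k * lam1 ** (L - 1) * (1 - lam1 / lam2)).real

def birth_death_mean(L, k, r):
    """Mean passage time 0 -> L from m_i = 1/k + (r/k) m_{i-1}, m_0 = 1/k."""
    total, m = 0.0, 0.0
    for _ in range(L):
        m = 1.0 / k + (r / k) * m                # mean time for one net step
        total += m
    return total

h = 1e-4
mean_fd = -(transmission_F(h, 4, 1.0, 0.5) - transmission_F(-h, 4, 1.0, 0.5)) / (2 * h)
mean_rec = birth_death_mean(4, 1.0, 0.5)
```

The two estimates `mean_fd` and `mean_rec` should agree to within the finite-difference error. `cmath.sqrt` is used so that the expression stays valid when the discriminant becomes negative; the imaginary parts cancel in F(s).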

Directed Kinetic Proofreading 

The second case we consider is that of directed kinetic proofreading, in which the 
backward transition rate is neglected, $r = 0$, but the return rate is non-zero, $\gamma > 0$. In 
this case the solution is much simpler and can be written as 

$$ P_i(s) = C_1\lambda^i, \tag{30} $$

where $\lambda$ is the single root of Eq. (22), given by 

$$ \lambda = \frac{k}{s+k+\gamma}, \tag{31} $$

and the coefficient $C_1$ is reduced to 

$$ C_1 = \left[\, s + k + \gamma - \gamma\,\frac{1-\lambda^L}{1-\lambda} \,\right]^{-1}. \tag{32} $$
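In this case, the Laplace transform of the first passage time is given by 

$$ F(s) = kP_{L-1}(s) = kC_1\lambda^{L-1}, \tag{33} $$

which gives the expressions for the mean escape time and its coefficient of variation as 
in Eqs. (9, 10) in the main text. In the case of site-dependent rates, one can still derive 
an expression for the Laplace transform of the completion time, which can be written 
as: 

$$ F(s) = \frac{k_{L-1}\displaystyle\prod_{j=1}^{L-1}\frac{k_{j-1}}{s+k_j+\gamma_j}}{s + k_0 - \displaystyle\sum_{i=1}^{L-1}\gamma_i\prod_{j=1}^{i}\frac{k_{j-1}}{s+k_j+\gamma_j}}, \tag{34} $$

where $k_j$ and $\gamma_j$ denote the forward and return rates out of state $j$ (with no return from state 0). 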
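
The directed case lends itself to a short forward recursion, since each $P_j(s)$ is a multiple of $P_{j-1}(s)$. The sketch below evaluates F(s) for site-dependent rates this way and then extracts the mean and coefficient of variation by finite differences. The function names, the convention that `k[j]` and `gamma[j]` are the rates out of state `j`, and the numerical values are assumptions made for illustration.

```python
def directed_F(s, k, gamma):
    """F(s) for directed proofreading (r = 0) with site-dependent rates.

    k[j]     : forward rate out of state j, j = 0..L-1
    gamma[j] : return rate from state j to state 0 (gamma[0] is ignored)
    """
    L = len(k)
    weight = 1.0          # P_j(s) / P_0(s), built up by forward recursion
    returned = 0.0        # sum over j of gamma_j * P_j(s) / P_0(s)
    for j in range(1, L):
        weight *= k[j - 1] / (s + k[j] + gamma[j])
        returned += gamma[j] * weight
    p0 = 1.0 / (s + k[0] - returned)             # row for state 0
    return k[L - 1] * weight * p0                # F(s) = k_{L-1} P_{L-1}(s)

def mean_and_cv(F, h=1e-4):
    """Mean and coefficient of variation by central differences at s = 0."""
    mean = -(F(h) - F(-h)) / (2 * h)
    second = (F(h) - 2 * F(0.0) + F(-h)) / h ** 2
    return mean, (second - mean ** 2) ** 0.5 / mean

# Uniform rates as a special case: L = 4, k = 1, gamma = 2 (assumed values)
L, k0, g0 = 4, 1.0, 2.0
uniform = lambda s: directed_F(s, [k0] * L, [g0] * L)
mean, cv = mean_and_cv(uniform)
```

With uniform rates the recursion reduces to the geometric form of the uniform-rate solution. For the assumed values the mean should come out close to 40 and the coefficient of variation close to 0.98, already near the exponential limit.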



General Kinetic Proofreading 



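In this case, all the rates $k$, $\gamma$, and $r$ are non-zero, and Eq. (22) has two solutions 

$$ \lambda_{1,2} = \frac{s+k+r+\gamma \pm \sqrt{(s+k+r+\gamma)^2 - 4kr}}{2r}. \tag{35} $$

By applying the boundary conditions in Eqs. (23) and (24), we obtain the expressions for $C_1$ and 
$C_2$: 

$$ C_1 = \left\{ r(\lambda_2-1) - \gamma\,\frac{1-\lambda_1^L}{1-\lambda_1} - \frac{\lambda_1^L}{\lambda_2^L}\left[\, r(\lambda_1-1) - \gamma\,\frac{1-\lambda_2^L}{1-\lambda_2}\right] \right\}^{-1}, \tag{36} $$

$$ C_2 = -C_1\,\frac{\lambda_1^L}{\lambda_2^L}, \tag{37} $$

with which one can define the Laplace transform of the first passage time PDF: 

$$ F(s) = C_1 k\lambda_1^{L-1}\left(1 - \frac{\lambda_1}{\lambda_2}\right). \tag{38} $$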
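
A minimal sketch of how the general solution could be evaluated numerically, assuming $r > 0$ and $L \ge 2$. The coefficient is assembled from the $P_0$ and $P_{L-1}$ boundary conditions, and the geometric sums are accumulated term by term so that the expression stays finite when a root equals 1. The function name `general_F` and the parameter values are illustrative.

```python
import cmath

def general_F(s, L, k, r, gamma):
    """Closed-form F(s) for general proofreading; assumes r > 0 and L >= 2."""
    b = s + k + r + gamma
    root = cmath.sqrt(b * b - 4 * k * r)
    lam1, lam2 = (b + root) / (2 * r), (b - root) / (2 * r)

    def geom(lam):
        # (1 - lam**L) / (1 - lam), summed directly to stay finite at lam = 1
        return sum(lam ** i for i in range(L))

    ratio = (lam1 / lam2) ** L                   # so that C2 = -C1 * ratio
    denom = (r * (lam2 - 1) - gamma * geom(lam1)
             - ratio * (r * (lam1 - 1) - gamma * geom(lam2)))
    c1 = 1.0 / denom
    return (c1 * k * lam1 ** (L - 1) * (1 - lam1 / lam2)).real

# Normalisation check at s = 0 for assumed rates
total_probability = general_F(0.0, 5, 1.0, 0.5, 0.3)
```

Since completion is certain, `total_probability` should equal 1 up to rounding. Setting `gamma = 0` recovers transmission mode, which offers a second consistency check.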



Once again, it is possible to derive the mean and variance of the escape time in this 
scheme. The mean is given by 

[Eq. (39): closed-form expression for $\mu_{\mathrm{gKPR}}$ in terms of $k$, $\psi$, $\theta$, $l_\pm$ and $L$; the formula is not recoverable from the extracted text] 

where $l_\pm$ are defined as in Eq. (16). The first passage time variance in this case is given 
by 

[closed-form expression for the variance $\sigma^2_{\mathrm{gKPR}}$ in terms of $k$, $\psi$, $\theta$, $l_\pm$ and $L$; the formula is not recoverable from the extracted text] 